Aligning data in a wide, high-speed, source synchronous parallel link

ABSTRACT

A source-synchronous parallel interface divides a wide data bus into clock-groups including a sub-group of the data lines and a clock line carrying a copy of the transmit clock. The traces in a clock-group are located physically close together to minimize skew between the signals carried on the traces of the clock-group. Deskew logic on the receiver compensates for skew between received clock-group signals.

RELATED APPLICATIONS

This application is a continuation in part of the commonly-assigned United States patent application entitled HIGH-SPEED MEMORY FOR USE IN NETWORKING SYSTEMS, filed Jun. 16, 2003, Ser. No. 10/462,866, which is hereby incorporated by reference for all purposes.

BACKGROUND OF THE INVENTION

The source-synchronous bus has been used to increase the speed of buses in many designs. Data and clock are sourced from the same device on the bus. The receiving device uses the clock from the bus to sample the data on the bus. Since the clock and data are driven and distributed similarly, they have similar delays and hence such buses can be run faster than buses using other clocking schemes.

At higher speed, being able to drive a clock becomes challenging, especially when the data pins are driven and sampled on both edges of the clock. This is referred to as double-data rate or DDR.

One of the limitations on speed derives from the fact that as the number of data pins gets large, the skew between those pins increases, where Clock Skew is the variation in the transition point of a clock signal due to delay in the propagation path. Since all pins need to be sampled with the same clock, clock skew limits the speed of the bus. In DDR3, SRAMs, and in fast packet forwarding ASICs, this limitation is overcome by limiting the number of data pins associated with a clock pin. For wider data buses, multiple copies of source-synchronous clocks are used. But still the skew between copies of clocks has to be limited to much less than the clock period in order to align the data sampled with different copies of clocks.

Accordingly, new parallel interfaces need to be developed that allow high speed data transfer between devices with a large number of pins.

BRIEF SUMMARY OF THE INVENTION

One embodiment of the invention allows the clock period of the transmit and receive core clocks to get smaller than the skew between the copies of source-synchronous clock. The maximum frequency of operation of the link is thereby increased to the limit reachable for sampling a small number of data pins with a source-synchronous clock received on a pair (clock high and clock low) of clock pins. There is no limit imposed because of skew between multiple copies of source-synchronous clocks.

In another embodiment of the invention, for each copy of source-synchronous clock, data is written into a receive-data FIFO in the receiver and data is read from all these FIFOs using a single core clock. An initialization protocol is used to align data between multiple FIFOs. The initialization protocol and the receive-data FIFOs can also be used to align data coming from multiple devices connected in parallel to the same receiving device.

In another embodiment of the invention, both the transmitting and receiving devices use a PLL (phase-locked loop) to phase-align their internal core-clocks with a common external reference clock. This limits the jitter and wander of the source-synchronous clock with respect to the receiver core-clock and that in turn reduces the depth of the receive-data FIFOs.

In another embodiment of the invention, the transmitting device may send data in a single clock from one or more logical buses in its core-clock domain over multiple source synchronous links. The receive-data FIFOs and the deskew protocol align the data from the logical bus(es) in the core-clock domain of the receiving device.

Other features and advantages of the invention will be apparent in view of the following detailed description and appended figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional timing diagram of an embodiment of the invention;

FIG. 2 depicts the transmitter interface model of an embodiment of the invention;

FIG. 3 depicts the receiver interface model of an embodiment of the invention;

FIG. 4 depicts the clock distributions and clock domains of an embodiment of the invention;

FIG. 5 depicts an embodiment of the invention having a link using two clock-groups;

FIG. 6 depicts a receiving device model for multiple clock copies of an embodiment of the invention;

FIG. 7 depicts an embodiment of the invention having deskew logic in the receive interface model;

FIG. 8 depicts a FIFO of an embodiment of the invention;

FIG. 9 depicts two transmitters coupled in parallel to a receiver; and

FIG. 10 depicts a transmitter and receiver coupled by independent logical buses divided into clock groups.

DETAILED DESCRIPTION OF THE INVENTION

Reference will now be made in detail to various embodiments of the invention. Examples of these embodiments are illustrated in the accompanying drawings. While the invention will be described in conjunction with these embodiments, it will be understood that it is not intended to limit the invention to any embodiment. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the various embodiments. However, the present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.

Several embodiments will now be described to implement a high-speed source-synchronous parallel link (SSPL) for transferring data at high speed between devices with a large number of pins. The embodiments include features such as multiple clock groups where multiple clock copies are transmitted and a limited number of data pins are associated with each clock signal, a deskew unit that aligns data sampled from different clock groups to a core clock, a clock generation system for forming clean copies of core clocks that have low jitter and noise, etc.

In the following embodiments a synchronous unidirectional parallel interface is described where the interface includes data-pins and clock-pins. All the information is carried on data-pins and the clock-pins toggle at a fixed frequency. This clock is referred to as the SSPL-clock. The receiver uses this clock to sample data received on the data-pins. The set of bits transferred in a clock cycle-time is referred to as a data-word and the link supports the transfer of a continuous stream of data-words, one in each clock cycle-time.

FIG. 1 is a functional timing diagram of the SSPL of this embodiment where the set of bits transferred in R0 (rising edge of Dclk) and F0 (falling edge of Dclk) form a data word. Dclk_(a)H (data clock high) and Dclk_(a)L (data clock low) refer to the high and low edges of the SSPL clock pair and D_(a)[N:0] is the group of data pins sampled by using clock pair Dclk_(a)H and Dclk_(a)L. In various embodiments, data can be transferred on rising and falling edge of a single clock, on the rising edges of two complementary clocks, or on the falling edges of two complementary clocks.

In the following, the group of N+1 data pins, referred to as D_(i)[N:0], is associated with a pair of clock pins, Dclk_(i)H & Dclk_(i)L. To describe multiple sets of such data and clock pins the letters a, b, and so on are used to replace the subscript “i”, e.g., D_(a)[N:0] associated with Dclk_(a)H & Dclk_(a)L, D_(b)[N:0] associated with Dclk_(b)H & Dclk_(b)L and so on.

In this embodiment the following design choices are made:

-   The SSPL uses source-synchronous clocking, i.e. data-pins and clock-pins are driven from the same source device to minimize skew between the data-pins and clock-pins.
-   Each data-pin of the SSPL transfers 2 bits of information in one cycle-time. Thus the maximum frequency of data-pins is the same as that of the clock-pins.
-   Data transition is center-aligned with the clock edges.
-   All data- and clock-pins are single-ended.
-   Clock pins occur in pairs (of opposite phase).

However, a dual clock signal utilizing either differential or complementary logic can be utilized as is known in the art.

FIGS. 2 and 3 depict the transmitter interface model and receiver interface model (SSPLrx). The transmitter and receiver each include a logic and array core that utilizes a transmitter core clock (TCC) and receiver core-clock (RCC) respectively. The transmitter clock (Tclk) is derived from TCC and is transmitted along with the data as a source-synchronous clock. The source synchronous clock is received at the Rclk inputs of the receiver interface. The phase relationship between Rclk and RCC is indeterminate. The receiver interface samples data from the SSPL and transfers it to the receiver core-clock domain.

The D_(a)In[N:0] input of the SSPLrx module is sampled with Rclk_(a)HIn and Rclk_(a)LIn clocks. The resultant data is output on D_(a)ROut[N:0] (rising edge) and D_(a)FOut[N:0] (falling edge) outputs in the RCC clock domain. After initialization, the module continuously samples the input and produces output. The rcvRst input, which is activated by an input pin and/or controlled through a programmable register, initializes the module.

The design of the SSPLrx module does not depend on phase comparison between the source-synchronous clocks and the receiver core-clock. Such phase difference may change during device operation and cause clock-slip. Therefore, the SSPLrx design uses a synchronization technique that does not depend on phase comparison and is described in greater detail below.

As depicted in FIG. 3, data from the SSPL is clocked in using Rclk and clocked out using RCC.

A technique for synthesizing clean (low-jitter) transmitter and receiver core clocks, where jitter refers to the uncertainty, or variability, of waveform timing, will now be described with reference to FIG. 4.

FIG. 4 shows the SSPL clock distribution technique. The transmitting device uses logic that runs at the SSPL-clock frequency or at double that frequency. The clock used for this logic is referred to as the transmitter core-clock. The transmitter core-clock is synthesized from a clean (low-jitter) system clock input to the transmitter device. The leaf of the clock-tree of the transmitter core-clock may be phase-locked to the system clock input and the SSPL-clock is derived from the transmitter core clock.

The receiving device uses logic that runs at the SSPL clock-frequency or at double that frequency. The clock used for this logic is referred to as the receiver core-clock. The receiver core-clock is synthesized from a clean (low-jitter) system clock input to the receiving device. The leaf of the clock tree of the receiver core-clock may be phase-locked to the system clock input.

In this embodiment, the system clock inputs to the transmitting and receiving devices are of the same frequency as the SSPL clock and are copies of a clock from the same source. Also, the transmitter and receiver core-clocks may be phase-locked to the system clock inputs so that the transmitter and receiver clock-tree delays have no effect on the phase difference between the transmitter- and receiver-core clock. Alternatively, the transmitter and receiver core clocks can have different frequencies.

An embodiment that utilizes multiple source-synchronous clock groups will now be described with reference to FIG. 5.

In this embodiment, the number of data-pins associated with a pair of clock pins is limited to between 18 and 20. When the bandwidth requirement of the link requires a large number of data-pins, multiple copies of clocks are used. Each pair of clock-pins and the associated data-pins are referred to as a clock-group.
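By way of illustration only, the number of clock-groups needed for a given bus width follows from the per-group pin limit. The Python sketch below is not part of the disclosed embodiments; the function names and the 72-pin bus width are assumptions chosen for the example.

```python
import math


def clock_groups_needed(total_data_pins: int, max_pins_per_group: int = 18) -> int:
    """Smallest number of clock-groups that keeps each group within the pin limit."""
    if total_data_pins <= 0 or max_pins_per_group <= 0:
        raise ValueError("pin counts must be positive")
    return math.ceil(total_data_pins / max_pins_per_group)


def bits_per_clock_cycle(total_data_pins: int) -> int:
    """Each data-pin carries 2 bits per cycle-time (one on each clock edge)."""
    return 2 * total_data_pins


# Hypothetical 72-pin link with at most 18 data-pins per clock-group.
groups = clock_groups_needed(72, 18)
word_bits = bits_per_clock_cycle(72)
```

With these assumed numbers the link would use four clock-groups, each with its own pair of clock pins, and would carry a 144-bit data-word per cycle-time.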

The pins of a clock-group are located physically close to each other in both the transmitting and the receiving device. The transmitter and the traces are carefully designed to minimize skew within a clock group. Though the clocks carried on these pins are derived from the same source, the skew between the clock copies in the different clock-groups may be substantial at the receiver interface.

FIG. 5 depicts an SSPL with two clock groups, referred to as “a” and “b”. By dividing the wide bus into source-synchronous clock groups the wide bus is effectively divided up into a series of smaller buses to reduce skew and allow for higher clock speeds.

However, as described above, the different copies of the clock, and associated data signals, in each clock-group may be skewed relative to each other when they arrive at the receiver interface. A system for removing the skew between the signal groups sampled by different received copies of the transmit clock will now be described.

FIG. 6 depicts an embodiment where the receiving device uses a separate copy of the receiver interface module for each clock-group. A deskew logic block is also depicted which aligns the data output from the multiple receiver interfaces and presents it to the receiver core.

The output of the SSPLrx modules in the different receiver interfaces may be skewed with respect to each other due to:

-   Skew between copies of SSPL-clocks
-   Skew between rcvRst inputs to SSPLrx
-   Skew between RCC inputs to SSPLrx

In this embodiment, the timing budget limits the maximum skew between the outputs of the different receiver interface modules to one RCC period. Therefore, data from clock-groups that arrive early may need to be delayed by one RCC period in order to align with data from clock-groups that arrive late. However, the invention is not limited by this constraint and the skew between clock groups may be less than, equal to, or greater than the clock period.

In different embodiments the deskew logic may be:

-   Integrated with receiver interface,
-   Integrated with receiver core,
-   Implemented as a separate module.

As depicted in FIGS. 1 and 2, data is clocked on the rising and falling edges of the transmit clock. In this embodiment the period of Tclk and RCC are the same and the first data frame is clocked on TclkH. Therefore, in the example of two clock groups, “a” and “b”, skewed by one RCC clock cycle where clock-group “a” is delayed relative to clock group “b”, the data sampled with Rclk_(a)H could arrive after the data sampled with Rclk_(b)L. The deskewing logic correctly aligns the data presented to the receiver core.

A protocol at device initialization is used to align the edges of the different copies of the SSPL-clock at the receiver. One data pin of each SSPL clock group is used for this purpose and is referred to as the SSPL-Init pin.

Initially the transmitting device drives ‘0’ on the SSPL-Init pin of all clock-groups. This is called the initial-value. Then it drives ‘1’ on the SSPL-Init pin of all the clock-groups (e.g. D_(a)[0] and D_(b)[0] in FIG. 5) simultaneously for one SSPL clock period. This value is called the initialization-pattern. The sequence of driving the initial-value followed by the initialization-pattern is referred to as the training sequence. The receiving device detects the transition from initial-value to the initialization-pattern to deskew the data sampled from different clock-groups as described in more detail below.
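The training sequence can be illustrated with a short behavioral sketch in Python. It is not part of the disclosed embodiments: the function names are hypothetical, time is counted in whole SSPL clock periods, and skew is modeled simply as extra periods of the initial-value seen at the receiver.

```python
from typing import List, Optional


def training_sequence(idle_periods: int) -> List[int]:
    """SSPL-Init pin values: the initial-value '0' held, then '1' for one period."""
    return [0] * idle_periods + [1]


def detect_transition(samples: List[int]) -> Optional[int]:
    """Index of the first 0 -> 1 transition on the SSPL-Init pin, or None."""
    for i in range(1, len(samples)):
        if samples[i - 1] == 0 and samples[i] == 1:
            return i
    return None


# Both clock-groups are driven simultaneously; group "a" arrives 2 periods late
# (an assumed skew used only for this example).
sent = training_sequence(4)
seen_b = sent
seen_a = [0, 0] + sent
skew_a_vs_b = detect_transition(seen_a) - detect_transition(seen_b)
```

Because the initialization-pattern leaves the transmitter on all clock-groups at the same time, the difference between the detection times at the receiver equals the skew between the groups.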

A first embodiment of the deskew integrated with the receiver interface will now be described with reference to FIG. 7 and FIG. 8.

FIG. 7 is a detailed schematic diagram of an embodiment of the SSPLrx that supports deskewing. FIG. 7 depicts a Deskew State Machine that generates a FIFO Write Restart (WrRst) signal, two FIFOs for buffering data received on the rising and falling edges of Rclk, an M stage synchronizer, and a RdyCtrl block to generate the FIFO Read Restart (RdRst) signal. The data is held for deskewing in the FIFOs inside the SSPLrx module. The delay between writing the first data to the Rclk_(a)LIn-clocked FIFO and reading the same data is established during device initialization and by the delay through the synchronizer. In the model represented in FIG. 7, the synchronizer uses ‘M’ stages of flops clocked with RCC. For other synchronizer structures and core-clock frequencies, the delay through the synchronizer will be different. Depending on that and the timing budget, it may be necessary to have additional delays and/or flops to generate ready#A.

The depth of the FIFOs must be such that the output data is held valid for sufficient time before that entry in the FIFO is overwritten with new data. The maximum time that the write-clock can advance and the maximum synchronizer-delay are factored into deciding the FIFO depth.

After device initialization, the deskew state machine drives the WrRst input of the FIFO high to hold the write-pointer of these FIFOs in the initial state. After detecting the initialization sequence, the deskew state machine drives WrRst low and allows the write-pointer to advance. Thus, the write-pointer and ready# signals in all the SSPLrx modules are controlled by the transmitter interface through the initialization sequence.

In this embodiment, in order to avoid putting extra load on D_(a)In[0], the DO[0] input of the deskew state machine is driven from the flop in the FIFO that samples D_(a)In[0] during device initialization. This flop in the FIFO must not be held under reset in order to allow propagation of the D_(a)In[0] value when WrRst is high.

The output of the deskew logic (Q) is initialized to LOW when the Rst signal is asserted. It then is driven to HIGH and remains HIGH when the training sequence (D[0]=1) is received.
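The behavior of the one-bit deskew state machine can be sketched in Python as follows. This is a behavioral illustration only: the class and method names are hypothetical, and the sketch assumes that WrRst is simply the complement of Q, a relationship the description implies but does not state explicitly.

```python
class DeskewStateMachine:
    """One-bit state machine clocked by the received clock of one clock-group."""

    def __init__(self) -> None:
        self.q = 0  # LOW after reset

    def clock(self, rst: int, d0: int) -> int:
        """Advance one received-clock cycle and return Q."""
        if rst:
            self.q = 0   # Rst asserted: Q initialized to LOW
        elif d0 == 1:
            self.q = 1   # initialization-pattern seen: Q goes HIGH and stays HIGH
        return self.q

    @property
    def wr_rst(self) -> int:
        """Assumed relationship: WrRst is high until Q goes HIGH."""
        return 0 if self.q else 1


sm = DeskewStateMachine()
sm.clock(rst=1, d0=0)          # device initialization
held = sm.wr_rst               # write-pointer held in its initial state
sm.clock(rst=0, d0=0)          # initial-value still on the SSPL-Init pin
sm.clock(rst=0, d0=1)          # initialization-pattern arrives
released = sm.wr_rst           # write-pointer may now advance
```

Once Q is HIGH, later values on D[0] are ordinary data and no longer affect the state, so the pin can carry data after initialization.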

FIG. 7 depicts the deskew logic for clock group “a”. This logic is repeated for each clock group. As described above, during the initialization signal a logic “0” signal is driven on the DO[0] signal of each clock group and these signals may be skewed relative to each other. Thus, the time of assertion of WrRst signal will vary from clock group to clock group depending on the amount of relative skew between the clock groups.

FIG. 8 depicts an embodiment of the FIFO, depicted in FIG. 7, that supports separate read- and write-clocks. The FIFO is deep enough to absorb the skew between the different clock groups. The write-clock is used to write data to the FIFO and advance the write-pointer inside the FIFO. The read-clock is used to sample data from the FIFO and advance the read-pointer inside the FIFO, if (incrRdPtr==1). In implementations where the frequency of the RdClk is a multiple of the frequency of WrClk, the input incrRdPtr is used to control the increment step of the read-pointer. This input is tied to ‘1’ where both clocks are the same frequency.

The WrRst and RdRst inputs to the FIFO counters are active high. When the WrRst input to the FIFO is high, the counter used for the write-pointer inside the FIFO is held in its initial state. Similarly, when the RdRst input to the FIFO is high, the counter used for the read-pointer inside the FIFO is held in its initial state. The FIFO uses edge-triggered D-flops with enable (EN) as storage elements. The data input is sampled by one set of the flops even when WrRst is driven high.
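A behavioral sketch of such a FIFO is given below in Python. It is an illustration under stated assumptions rather than the circuit of FIG. 8: the class and method names are hypothetical, each method call stands for one edge of the corresponding clock, and a pointer is held at entry 0 while its restart input is high.

```python
from typing import Any, List


class DualClockFifo:
    """FIFO with independent write and read pointers and restart inputs."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.mem: List[Any] = [None] * depth
        self.wr_ptr = 0
        self.rd_ptr = 0

    def write_clock(self, data: Any, wr_rst: int) -> None:
        """One write-clock edge. The entry at the pointer samples the input
        even while WrRst holds the write-pointer in its initial state."""
        self.mem[self.wr_ptr] = data
        if wr_rst:
            self.wr_ptr = 0
        else:
            self.wr_ptr = (self.wr_ptr + 1) % self.depth

    def read_clock(self, rd_rst: int, incr_rd_ptr: int = 1) -> Any:
        """One read-clock edge. Returns the entry at the read-pointer and
        advances the pointer only if RdRst is low and incrRdPtr is 1."""
        if rd_rst:
            self.rd_ptr = 0
            return self.mem[0]
        out = self.mem[self.rd_ptr]
        if incr_rd_ptr:
            self.rd_ptr = (self.rd_ptr + 1) % self.depth
        return out


fifo = DualClockFifo(depth=4)
fifo.write_clock("init", wr_rst=1)   # pointer held, entry 0 still samples input
fifo.write_clock("w0", wr_rst=0)     # first data word lands in entry 0
fifo.write_clock("w1", wr_rst=0)
first = fifo.read_clock(rd_rst=0)
second = fifo.read_clock(rd_rst=0)
```

Because the entry at the initial pointer position keeps sampling the input while WrRst is high, the first word written after WrRst is released always lands in entry 0, which is where the read-pointer starts when RdRst is released.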

The use of the deskew logic to deskew data between multiple clock groups will now be described in more detail with reference to FIGS. 6, 7 and 8. As described above with reference to FIG. 7, the WrRst signal is not driven low until the transition from the initial value to the initialization pattern of the training sequence is detected. The WrRst signal is then input to the synchronizer and is output as the Ready# signal after a fixed delay.

The amount of this fixed delay is controlled by a value encoded in the DeskewStateDelay[d:0] signal. The actual implementation of the system on a chip may require additional flops to be added between the output of the M stage Synchronizer and the SSPLrx macros thereby inserting additional delay after FIFO initialization requiring more FIFO depth. In this embodiment the DeskewStateDelay[d:0] signal is used to program the Deskew State machine to delay the assertion of WrRst to the input of the Synchronizer. This delayable WrRst signal is denominated as the RdRstSync signal in FIG. 7.
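The fixed delay from WrRst to Ready# can be pictured as a programmable delay followed by the M stage synchronizer. The Python sketch below is a simplified illustration, not the circuit of FIG. 7: the names are hypothetical, both delays are modeled as chains of flops on a single time base, and the clock-domain crossing itself is ignored.

```python
from typing import List


class DelayLine:
    """Chain of flops; the output is the input from `stages` clocks earlier."""

    def __init__(self, stages: int, reset_value: int = 1) -> None:
        self.regs: List[int] = [reset_value] * stages

    def clock(self, value: int) -> int:
        self.regs.insert(0, value)
        return self.regs.pop()


def ready_n_waveform(wr_rst: List[int], m_stages: int, extra_delay: int) -> List[int]:
    """Ready# seen after a programmable delay (standing in for the value
    encoded on DeskewStateDelay) and an M stage synchronizer."""
    pre = DelayLine(extra_delay)
    sync = DelayLine(m_stages)
    return [sync.clock(pre.clock(v)) for v in wr_rst]


# WrRst falls at cycle 2; with M = 2 and one extra cycle, Ready# falls at cycle 5.
ready_n = ready_n_waveform([1, 1, 0, 0, 0, 0, 0], m_stages=2, extra_delay=1)
```

Since every clock-group uses the same M and the same programmed delay, the skew between the Ready# signals of different groups mirrors the skew between their WrRst signals.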

In this embodiment, the Deskew Logic of FIG. 6 includes a one-bit deskew state machine (FIG. 7) for clock group “a” and another one-bit deskew state machine for clock group “b”. In this example it is assumed that the data of clock group “a” are delayed relative to the data of clock group “b”. As described above, WrRst will be driven low by each one-bit state machine when the transition of the initial value of training sequence is detected. Thus, referring to FIG. 8, the FIFO starts writing the received data when WrRst is driven low. In this case, because of the skew between clock groups “a” and “b”, the signal WrRstb will be driven low before the signal WrRsta and data from clock group “b” will be read into the FIFO prior to data from group “a”.

The RdRstSync signal is driven low either simultaneously with WrRstA or after a fixed delay. The M stage Synchronizer is driven by the internal clock signal RCC and forms the boundary between the receive clock domain and the internal clock domain. The Ready signal is synchronized to RCC.

In the example currently being described, the signal Ready_(b) will be driven low before the signal Ready_(a). However, in this case all FIFO Read Counters receive a RdRst signal which is in the form of the logical OR of all the Ready_(i) signals driven low by the individual one-bit deskew state machines so that no data will be read from the FIFOs until the initial data of all the clock groups has been written to a corresponding FIFO. The RdRst signal can thus be used to keep all the FIFO read pointers “on hold” until all the groups are initialized and ready to read out data. Accordingly, the RdRst signal will be driven low only after both Ready_(a) and Ready_(b) signals are driven low and the first data received on both clock group signals will be read in synchronism from the FIFO when RdRst is driven low and the skew between the clock groups is removed.
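The effect of the shared RdRst signal can be illustrated with a small cycle-based simulation in Python. It is a sketch under simplifying assumptions, not a model of the actual circuit: the names are hypothetical, all clocks share one time base, each FIFO is an unbounded list, and `sync_delay` stands for the fixed synchronizer delay.

```python
from typing import Dict, Iterable, List, Tuple


def read_restart(ready_n: Iterable[int]) -> int:
    """RdRst is the logical OR of the active-low Ready signals:
    it goes low only when every Ready signal is low."""
    return int(any(ready_n))


def deskew(streams: Dict[str, List[str]], skews: Dict[str, int],
           sync_delay: int = 2) -> List[Tuple[str, ...]]:
    """Write each group's words as they arrive; read all FIFOs together
    once RdRst is low. Returns the aligned words, one tuple per read."""
    fifos: Dict[str, List[str]] = {g: [] for g in streams}
    aligned: List[Tuple[str, ...]] = []
    rd_ptr = 0
    horizon = max(skews.values()) + sync_delay + max(len(w) for w in streams.values())
    for t in range(horizon):
        ready_n = []
        for g, words in streams.items():
            k = t - skews[g]                  # word of group g arriving now
            if 0 <= k < len(words):
                fifos[g].append(words[k])     # WrRst already low for this group
            ready_n.append(0 if t >= skews[g] + sync_delay else 1)
        if read_restart(ready_n) == 0 and all(rd_ptr < len(f) for f in fifos.values()):
            aligned.append(tuple(fifos[g][rd_ptr] for g in sorted(fifos)))
            rd_ptr += 1
    return aligned


# Group "a" is delayed by one cycle relative to group "b".
out = deskew({"a": ["a0", "a1", "a2"], "b": ["b0", "b1", "b2"]},
             skews={"a": 1, "b": 0})
```

In this run group “b” starts filling its FIFO one cycle before group “a”, yet the first read returns the first word of both groups together, because reading waits for the later of the two Ready signals.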

In an alternative embodiment, the RdRst signal is derived from the logical AND of the Ready_(a) and Ready_(b) signals delayed by S clocks, where an interval of S clock delays is greater than the maximum budgeted skew interval between clock groups. In this case, the RdRst signal will be driven low if any of the Ready signals are driven low. This removes a possible fault where one of the Ready signals getting stuck could hang up the receiver. However, this option adds a delay since the Ready signal will not be driven low until after the S-clock delay expires.
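The difference between the two ways of deriving RdRst can be sketched by comparing the cycle at which each would release the read pointers. The Python functions below are illustrative only; their names are hypothetical, times are in receiver clock cycles, and `None` stands for a Ready signal that is stuck high and never goes low.

```python
from typing import List, Optional


def release_time_or(ready_low_times: List[Optional[int]]) -> Optional[int]:
    """OR of the Ready signals: RdRst goes low when the last Ready goes low.
    A stuck Ready signal means RdRst never goes low."""
    if any(t is None for t in ready_low_times):
        return None
    return max(ready_low_times)


def release_time_and_delayed(ready_low_times: List[Optional[int]], s: int) -> Optional[int]:
    """AND of the Ready signals delayed by S clocks: RdRst goes low S clocks
    after the first Ready goes low, even if another Ready is stuck."""
    valid = [t for t in ready_low_times if t is not None]
    return min(valid) + s if valid else None


# Assumed example: Ready_b falls at cycle 3, Ready_a at cycle 5, and S = 4
# exceeds the budgeted skew of 2 cycles.
normal_or = release_time_or([5, 3])
normal_and = release_time_and_delayed([5, 3], s=4)
stuck_or = release_time_or([None, 3])
stuck_and = release_time_and_delayed([None, 3], s=4)
```

With these assumed numbers the AND form releases the read pointers two cycles later than the OR form, which is the added delay noted above, but it still releases them when one Ready signal never falls.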

FIG. 9 shows a configuration where multiple transmitting devices are connected in parallel. In response to identical commands sent by the receiving device on Cmda and Cmdb bus, U1a and U1b drive different parts of the same data-frame on D_(a)[N:0] and D_(b)[N:0]. The receiving device needs to align the data from U1a and U1b received from two different devices. One or more bits on the SSPL must provide framing information for the data-frame driven by the transmitting device. The receiving device can use the framing information to align data from different transmitting devices.

In case the transmitting device has the same latency for all commands, this embodiment provides a mechanism to align data from two transmitting devices. U1a and U1b drive the initialization pattern in response to a “send-initialization-pattern” command from the receiving device. During initialization, the receiving device sends this command simultaneously to both U1_(a) and U1_(b). The receiving device then uses the initialization patterns from the two devices to deskew the data from U1_(a) and U1_(b) (similarly to how a receiving device deskews data from two clock-groups as described with reference to FIGS. 7-8).

Due to skew between the core-clocks of U1_(a) and U1_(b), the skew between the source-synchronous clocks from two devices can be larger than the skew between two clock-groups from the same device. The SSPL is also used for CmdA and CmdB buses. The synchronizers in the SSPL receiver interface in U1_(a) and U1_(b) may skew the commands by an additional period. If the latency of U1_(a) and U1_(b) is unequal, the receiving device needs to support additional skew amounting to the latency-difference between U1_(a) and U1_(b).

The data in the different SSPLrx modules in the receiving device may be skewed with respect to each other due to:

-   Skew between Cmd_(a) and Cmd_(b) bus clocks
-   Skew between TCC_(a) and TCC_(b)
-   Phase-error and jitter of frequency-synthesizer inside U1_(a) and U1_(b)
-   Delay difference between transmitter cores in U1_(a) and U1_(b)
-   Delay difference between source-synchronous clocks from U1_(a) and U1_(b)
-   Skew between RstIn inputs to SSPLrx modules in the receiving device
-   Jitter and skew of RCC in receiving device

In another embodiment the deskew logic deskews independent buses while maintaining the temporal relationship between the data on the buses. For example, FIG. 10 depicts two chips: Tx (transmitter) and Rx (receiver). There are two busses going from Tx to Rx, labeled M and N, where M has 3 clock groups M1, M2 and M3, and N has 2 clock groups N1 and N2.

The bus M is self-contained and independent of N, meaning all the necessary signaling is present within M so that the core logic in Tx can transfer data through M to core logic in Rx. Likewise, N is self-contained and independent of M. This means that buses M and N can independently carry two “streams” of data from Tx to Rx. However, there are applications where there is a “temporal” relationship between the data on M and N. For example, an element of data on M (like a packet) may precede an element of data on N (for example, some information related to the previous packet) by a fixed number of core clock periods. The following example illustrates this:

-   M: XXXXXXXX1234XX56XX7XXXX . . .
-   N: XXXXXXXXXXXABCDXXEFXGXXX . . .

The data “ABCD” on N follows the data “1234” on M by two clocks (in the transmitter core logic domain). The following is an example of what could happen when these busses go through the SSPL. Assuming M1 has a zero skew, and with respect to M1,

-   skew(M1, M2)=2
-   skew(M1, M3)=4
-   skew(M1, N1)=4
-   skew(M1, N2)=5

If M1-M3 and N1-N2 are treated as two busses and grouped separately, then:

-   the “M” set deskews M1, M2 and M3, and the total delay on M bus on the receiver side would be 4 (due to M3), and
-   the “N” set deskews N1 and N2, and the total delay on N bus on the receiver side would be 5 (due to N2)
-   this means that the receiver gets data on the N bus later with respect to data on M.

So, the SSPL skews in the “physical layer” (board, IO, etc.) have altered the temporal relationship between data on M and N and the receiver core logic has to have additional logic to handle this.

Instead, in this embodiment M1-M3 and N1-N2 are treated as a single bus in the SSPL domain, so that the total delay on all the groups (M1, M2, M3, N1, N2) would be 5 (due to N2). The temporal relationship between the data on M and N is thereby preserved, and the transmitter core and receiver core remain in sync with respect to M and N regardless of physical layer skews. This decouples the logic layer protocols from the physical layer protocols and keeps the core logic design “clean” and independent of SSPL skews.
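The arithmetic of this example can be checked with a few lines of Python. The skew values are those listed above; the helper name `per_group_delay` and the data layout are assumptions made for the illustration.

```python
from typing import Dict, List

# Skews relative to M1, in clock periods, as given in the example.
SKEW: Dict[str, int] = {"M1": 0, "M2": 2, "M3": 4, "N1": 4, "N2": 5}

M_GROUPS = ["M1", "M2", "M3"]
N_GROUPS = ["N1", "N2"]


def per_group_delay(deskew_sets: List[List[str]]) -> Dict[str, int]:
    """Every clock-group in a deskew set is delayed to match the slowest
    group of that set; return the resulting delay for each group."""
    delays: Dict[str, int] = {}
    for groups in deskew_sets:
        slowest = max(SKEW[g] for g in groups)
        for g in groups:
            delays[g] = slowest
    return delays


separate = per_group_delay([M_GROUPS, N_GROUPS])   # M and N deskewed separately
combined = per_group_delay([M_GROUPS + N_GROUPS])  # all five groups as one bus

shift_separate = separate["N1"] - separate["M1"]   # N arrives 1 period late vs M
shift_combined = combined["N1"] - combined["M1"]   # no relative shift
```

Deskewing the two buses separately gives delays of 4 and 5 and shifts N by one period relative to M, while deskewing all five groups together gives a common delay of 5 and leaves the relationship between M and N unchanged.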

The invention has now been described with reference to the preferred embodiments. Alternatives and substitutions will now be apparent to persons of skill in the art. For example, the logic levels described above are arbitrary and may be varied as is known in the art. Further, the number of data lines in a clock group depends on system design and timing budgets. Accordingly, it is not intended to limit the invention except as provided by the appended claims.

1. A system comprising: a transmitter comprising: Tx core logic utilizing a transmitter core clock (TCC); a transmitter interface module adapted to be coupled to a parallel bus, with the parallel bus including first and second clock groups, with each clock group including a set of data lines and an associated clock line, with the data lines and associated clock line in a clock group located physically close to one another to minimize skew between the data signals and the clock signal carried on the lines of the clock group, with the transmitter interface module configured to clock data onto the data lines of the first and second clock groups in synchronism with first and second transmit clock copies, respectively, derived from the TCC, and with the transmitter interface further configured to transmit a single initial value having a first logic value only on a selected data line of each of the first and second clock groups and subsequently to simultaneously transmit a single training sequence value having a second logic value different from the first logic value only on the selected data line in each of the first and second clock groups, where the single initial value and single training sequence value are transmitted only once when the transmitter is initialized; a receiver comprising: Rx core logic utilizing a receiver core clock (RCC); a first deskew state machine adapted to receive the first transmit clock copy and the initial value and training sequence value on the selected data line of the first clock group, with the first deskew state machine adapted to assert a first write restart signal when the training sequence value is received; a first synchronizer, clocked by the RCC, coupled to receive the first write restart signal and assert a first read ready signal after a delay of a selected number of RCC cycles from the assertion of the first write restart signal; a second deskew state machine, coupled to receive the second transmit clock copy and the initial value and training sequence value on the selected data line of the second clock group, with the second deskew state machine adapted to assert a second write restart signal when the training sequence value is received where the first and second write restart signals will be asserted at different times if there is skew between the first and second associated clock groups; a second synchronizer, clocked by RCC, coupled to receive the second write restart signal and assert a second read ready signal after the delay of the selected number of RCC cycles from the assertion of the second write restart signal; read restart logic, coupled to receive the first and second read ready signals, adapted to assert a read restart signal when both the first and second read ready signals are asserted, where assertion of the read restart signal indicates that both the first and second associated clock groups are initialized and ready to read out data; a first FIFO having a data input adapted to sample data from the data lines of the first clock group in synchronism with the first transmit clock copy received on the clock line of the first clock group, with the first FIFO including write logic, coupled to the first deskew state machine to receive the first write restart signal, with the write logic configured to start writing data to the first FIFO when the first write restart signal is asserted and the first FIFO including read logic, coupled to the read restart logic to receive the read restart signal, and with the read logic configured to start reading data from the first FIFO when the read restart signal is asserted; and a second FIFO having a data input adapted to sample data from the data lines of the second clock group in synchronism with the second transmit clock copy received on the clock line of the second clock group, with the second FIFO including write logic, coupled to the second deskew state machine to receive the second write restart signal, with the write logic configured to start writing data to the second FIFO when the second write restart signal is asserted and with the second FIFO including read logic, coupled to the read restart logic to receive the read restart signal, with the read logic configured to start reading data from the second FIFO when the read restart signal is asserted where writing data to the first and second FIFOs will begin at different times if there is skew between the first and second clock groups but reading data will begin at the same time.
2. The system of claim 1 where data is clocked on the rising and falling edges of the transmit clock copy and a positive and negative transmit clock copy are carried on two clock lines of each clock group.
3. The system of claim 1 where the transmitter core clock and receiver core clock are derived from a common clock source signal to reduce jitter.
4. The system of claim 1 where TCC and RCC are not equal.
5. A method comprising:
asserting a first write restart signal when a single training sequence value having a second logic value is received after the receipt of a single initial value having a first logic value is received only on a selected data line of a first clock group of a parallel bus, with the first clock group including data lines and an associated clock signal line and with the single training sequence value and single initial value received only once upon initialization of a transmitter;
starting sampling data on data lines of the first clock group into a first FIFO in synchronism with first FIFO write signals when the first write restart signal is asserted;
asserting a second write restart signal when a training sequence value having a second logic value is received after the receipt of an initial value having a first logic value is received on a selected data line of a second clock group of a parallel bus, with the second clock group including data lines and an associated clock signal line with the data lines and associated clock line in a clock group located physically close to one another to minimize skew between the data signals and the clock signal carried on the lines of the clock group;
starting sampling data on data lines of the second clock group into a second FIFO in synchronism with second FIFO write signals when the second write restart signal is asserted;
delaying the first write ready signal by a fixed number of receiver core clock (RCC) cycles to form a first ready signal;
delaying the second write ready signal by the fixed number of receiver core clock (RCC) cycles to form a second ready signal;
asserting a third read restart signal when both the first and second write ready signals are asserted, where assertion of the read restart signal indicates that both the first and second associated clock groups are initialized and ready to read out data;
starting reading data from the first FIFO in synchronism with the first FIFO read signals when the read restart signal is asserted;
starting reading data from the second FIFO in synchronism with the second FIFO read signals when the read restart signal is asserted where writing data to the first and second FIFOs will begin at different times if there is skew between the first and second clock groups but reading data will begin at the same time.
6. A system comprising:
means for asserting a first write restart signal when a single training sequence value having a second logic value is received after the receipt of a single initial value having a first logic value is received only on a selected data line of a first clock group of a parallel bus, with the first clock group including data lines and an associated clock signal line and with the single training sequence value and single initial value received only once upon initialization of a transmitter;
means for starting sampling data on data lines of the first clock group into a first FIFO in synchronism with first FIFO write signals when the first write restart signal is asserted;
means for asserting a second write restart signal when a training sequence value having a second logic value is received after the receipt of an initial value having a first logic value is received on a selected data line of a second clock group of a parallel bus, with the second clock group including data lines and an associated clock signal line with the data lines and associated clock line in a clock group located physically close to one another to minimize skew between the data signals and the clock signal carried on the lines of the clock group;
means for starting sampling data on data lines of the second clock group into a second FIFO in synchronism with second FIFO write signals when the second write restart signal is asserted;
means for delaying the first write ready signal by a fixed number of receiver core clock (RCC) cycles to form a first ready signal;
means for delaying the second write ready signal by the fixed number of receiver core clock (RCC) cycles to form a second ready signal;
means for asserting a third read restart signal when both the first and second write ready signals are asserted, where assertion of the read restart signal indicates that both the first and second associated clock groups are initialized and ready to read out data;
means for starting reading data from the first FIFO in synchronism with the first FIFO read signals when the read restart signal is asserted;
means for starting reading data from the second FIFO in synchronism with the second FIFO read signals when the read restart signal is asserted where writing data to the first and second FIFOs will begin at different times if there is skew between the first and second clock groups but reading data will begin at the same time.
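
By way of illustration only, the receive-side behavior recited above may be modeled in software. The following Python sketch is not part of the claims and is not the claimed hardware. It assumes, for simplicity, that the transmit clock copies and the RCC run at the same rate, that skew is a whole number of cycles, and that each clock group delivers one value per cycle. The names `ClockGroupReceiver`, `received_stream`, `simulate`, and `sync_delay` are hypothetical and chosen for this example.

```python
from collections import deque

INITIAL, TRAINING = 0, 1  # logic values carried on the selected data line


class ClockGroupReceiver:
    """Model of one clock group: deskew state machine, synchronizer, FIFO."""

    def __init__(self, sync_delay):
        self.sync_delay = sync_delay      # assumed synchronizer delay, in cycles
        self.seen_initial = False
        self.write_restart = False
        self.cycles_since_restart = 0
        self.fifo = deque()

    def clock(self, sample):
        """Advance one cycle; sample is the received value, or None if idle."""
        if self.write_restart:
            # Write logic: every value after the training value enters the FIFO.
            self.cycles_since_restart += 1
            if sample is not None:
                self.fifo.append(sample)
        elif sample == INITIAL:
            self.seen_initial = True
        elif sample == TRAINING and self.seen_initial:
            # Training value seen after the initial value: assert write restart.
            self.write_restart = True

    @property
    def read_ready(self):
        # Synchronizer: write restart delayed by a fixed number of cycles.
        return self.write_restart and self.cycles_since_restart >= self.sync_delay


def received_stream(skew_cycles, words):
    """Skew is modeled as extra cycles during which the initial value is held."""
    return [INITIAL] * (1 + skew_cycles) + [TRAINING] + list(words)


def simulate(skews, words_per_group, sync_delay=2):
    groups = [ClockGroupReceiver(sync_delay) for _ in skews]
    streams = [received_stream(s, w) for s, w in zip(skews, words_per_group)]
    n_cycles = (max(len(s) for s in streams) + sync_delay
                + max(len(w) for w in words_per_group))
    write_start = [None] * len(groups)
    read_restart = False
    aligned = []
    for cycle in range(n_cycles):
        for i, (group, stream) in enumerate(zip(groups, streams)):
            group.clock(stream[cycle] if cycle < len(stream) else None)
            if group.write_restart and write_start[i] is None:
                write_start[i] = cycle
        # Read restart logic: asserted once every group reports read ready.
        read_restart = read_restart or all(g.read_ready for g in groups)
        if read_restart and all(g.fifo for g in groups):
            aligned.append(tuple(g.fifo.popleft() for g in groups))
    return write_start, aligned


if __name__ == "__main__":
    starts, words = simulate([0, 3], [["A0", "A1", "A2", "A3"],
                                      ["B0", "B1", "B2", "B3"]])
    print(starts)  # cycle at which each group asserted write restart
    print(words)   # words read out of both FIFOs on the same cycles
```

In this model the two groups assert write restart on different cycles because of the modeled skew, yet each tuple read out pairs the words that were transmitted together, since reading from both FIFOs starts on the same cycle.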